The influence of the network's structure on the dynamics of spreading processes has been extensively studied in the last decade. Important results that partially answer the question of this influence show a weak connection between the macroscopic behavior of these processes and specific structural properties of the network, such as the largest eigenvalue of a topology-related matrix. However, little is known about the direct influence of the network topology at the microscopic level, such as the influence of the (neighboring) network on the probability of a particular node's infection. To answer this question, we derive both an upper and a lower bound for the probability that a particular node is infective in a susceptible-infective-susceptible (SIS) model for two cases of spreading processes: reactive and contact processes. The bounds are derived by considering the n-hop neighborhood of the node; the bounds become tighter as a larger n-hop neighborhood is used to calculate them. Consequently, using local information for different neighborhood sizes, we assess the extent to which the topology influences the spreading process, thus also providing a strong macroscopic connection between the former and the latter. Our findings are complemented by numerical results for a real-world e-mail network. A very good estimate for the infection density ρ is obtained using only 2-hop neighborhoods, which account for 0.4% of the entire network topology on average.



PACS numbers: 89.75.Hc, 02.50.Ey, 87.19.X- 
I. INTRODUCTION 

Complex network theory has opened the way for exploring many dynamical processes on large-scale systems consisting of individual components connected in a nontrivial topology. Among the most widely studied phenomena occurring on complex networks are spreading processes, a prominent example attracting widespread attention being the spread of viruses in social or computer networks [THS].

Several approaches are used in the analysis of epidemic spreading. One popular approach is the heterogeneous mean-field (HMF) prescription, which coarse-grains nodes within degree classes and relaxes the problem by assuming that all nodes in a degree class have the same dynamical properties [2 [Sj [71 lS]. However, it has been shown that HMF can yield different levels of accuracy [3]. A more successful approach to determining the outcome of an infection was introduced by Chakrabarti et al. [10], where the SIS epidemic model was analyzed using a system of probability equations which, in fact, represents a deterministic non-linear dynamical system (NLDS). This approach was also used in [11], where a family of SIS epidemic models is examined, parameterized by the number of stochastic contact trials per unit time, ranging from contact processes (where the contagion expands at a certain rate from an infective vertex to one neighbor at a time) to reactive processes (in which an infective individual effectively contacts all its neighbors to expand the epidemic). Using a deterministic model, referred to as the Microscopic Markov-Chain Approach (MMCA), which is virtually equivalent to NLDS, the whole phase diagram of the different infection models is constructed and their critical properties are determined. It is worth noting that allowing different numbers of stochastic contagions per unit time extends the usability of the model, since this number can vary between real-world problems [12]. Recently, a mixed approach using both NLDS and HMF was proposed in [13], which led to a nonperturbative formulation enhancing the predictive power of the classical HMF approach. Heterogeneous environments have also been extensively studied. One example is an epidemic model with inhomogeneous infection probabilities on a graph with a prescribed degree distribution [14], where the model's dynamics are derived for i.i.d. weights and for weights that are functions of the degrees.

With the help of these theoretical frameworks, the role of network topology in the spreading process has been repeatedly emphasized, yielding a finite threshold for the spreading process in networks with exponentially bounded degree distributions, and a vanishing threshold in infinite uncorrelated networks with a power-law degree distribution. A recent addition to these findings is that, for the SIS epidemic model, the vanishing threshold has nothing to do with the scale-free nature of the degree distribution, but is the result of the largest hub being a self-sustainable source of the infection [13] (see also [TB]). However, the currently established connections are very rough, with a topology-related threshold differentiating between two extreme outcomes of the model. Above the threshold there is still a large spectrum of spreading parameters, and the poorly understood role of the network's topology in this regime motivated our work.

In this paper we adopt the approach proposed in [10, 11] and study the deterministic epidemic model on graphs, in which the dynamics of individual nodes is described by a discrete-time Markov chain. In the SIS model, a node can be in one of two states: susceptible (S) or infective (I). Infective nodes can infect neighbouring nodes, and each infective node is randomly cured with probability δ per unit time. At each time step, an infective node makes a number of trials to transmit the disease to its neighbours, each succeeding with probability β. We consider two specific cases: (i) the contact process, which involves a single stochastic contagion per infective node per unit time, and (ii) the reactive process, which involves as many stochastic contagions per unit time as a node has neighbours. The work in this paper extends that of [10, 11]. We derive upper and lower bounds on the probability of a node being infective, and determine how tight the bounds are around that probability. For both processes the bounds are derived using the n-hop neighbourhood of each node. The larger the considered neighbourhood, the more topological information is used to determine the bounds, and hence the tighter they are. We use the difference between the upper and lower bounds, averaged over all nodes, to determine the influence of the network topology on the spreading process, and compute numerical results for a real-world e-mail network. For additional clarity, Figure 1 depicts the 1-hop and 2-hop neighborhoods of a particular node with degree 10 in the Enron e-mail network, together with the bounds on its probability of infection calculated using only the information in the respective subgraph.

The outline of the paper is as follows. Section II gives the definition of the model and recovers known results. The contributions of this paper are contained in Sections III and IV. In Section III the upper and lower bounds on the probability of being infective are derived for the reactive process, and numerical results for the e-mail network are presented. Section IV gives the bounds for the contact process, along with the corresponding numerical results. Section V concludes the paper and points out future research directions.



II. MODEL DEFINITION AND ANALYSIS 

Consider a closed population of N individuals, connected in a network structure which is represented by a simple, undirected, unweighted, connected and unipartite graph G = (V, E) with node set V and edge set E. The adjacency matrix of the graph is given by




[Figure 1: (a) 1-hop neighborhood; (b) 2-hop neighborhood]



FIG. 1. 1-hop and 2-hop neighborhoods of a node extracted from a real-world e-mail network with 33696 nodes. The node (largest in size) has 10 direct neighbors (medium sized). The probability of infection of the given node obtained after simulating a particular configuration of the SIS model was 0.373. (a) The 1-hop neighborhood is a tree with the given node at its center and 10 peripheral nodes. The probability of infection of the node given its 1-hop neighborhood (the node's degree) is calculated to be between 0.006 and 0.503. (b) The 2-hop neighborhood contains 62 nodes and 92 edges. Peripheral nodes are smallest in size and are two hops away from the central node. The probability of infection of the node given the 2-hop neighborhood topology is calculated to be between 0.297 and 0.416. Note that the difference between the upper and lower bounds gets smaller as more topology information is used.



$A = [a_{ij}]_{N\times N}$, where $a_{ij} = 1$ if node i is connected to node j, and $a_{ij} = 0$ otherwise. Each node can be in one of two possible states: susceptible (S) or infective (I). Susceptible nodes are healthy and can contract the disease upon contact with infective nodes, which spread the disease. After the infectious period of the disease has ended, a node becomes susceptible to the disease once again. The initial set of infective nodes at time 0 is assumed to be non-empty, and all other nodes are assumed to be in state S at time 0.

The state of a node is represented by a status vector, an indicator vector containing a single 1 in the position corresponding to the present state and 0 in the other: $s_i(t) = [s_i^S(t)\ \ s_i^I(t)]^T$, for all $i \in \{1, \dots, N\}$. Let $p_i(t) = [p_i^S(t)\ \ p_i^I(t)]^T$ be the probability mass function (PMF) of node i at time t. The evolution of SIS is described by the following equations:

$$p_i^S(t+1) = s_i^S(t)\,\bigl(1 - f_i(t)\bigr) + \delta\, s_i^I(t), \qquad p_i^I(t+1) = s_i^S(t)\, f_i(t) + (1-\delta)\, s_i^I(t) \tag{1}$$

and

$$s_i(t+1) = \mathrm{MultiRealize}[p_i(t+1)], \tag{2}$$

where MultiRealize[·] performs a random realization of the PMF given by $p_i(t+1)$. Here, $0 < \delta < 1$ is the probability of curing and $0 < \beta < 1$ is the probability of disease transmission from an infective to a susceptible node.
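To make the status-based update concrete, the following minimal Python sketch performs one synchronous step of Eqs. (1)-(2). It is an illustration under stated assumptions, not the authors' code: the graph is given as neighbour lists `adj`, the contact probabilities $r_{ij}$ are stored as `R[i][j]`, $f_i(t)$ is assumed to take the product form $1 - \prod_j (1 - \beta r_{ij} s_j^I(t))$, the helper name `sis_step` is hypothetical, and a uniform draw from Python's `random` module plays the role of MultiRealize.

```python
import random


def sis_step(adj, R, s_inf, beta, delta, rng=random):
    """One synchronous update of the status-based SIS model, Eqs. (1)-(2).

    adj   -- neighbour lists, adj[i] = nodes j with a_ij = 1 (assumed format)
    R     -- contact probabilities, R[i][j] = r_ij for j in adj[i] (assumed format)
    s_inf -- current statuses s_i^I(t) as 0/1 integers
    """
    new_status = []
    for i, nbrs in enumerate(adj):
        # f_i(t) = 1 - prod_j (1 - beta * r_ij * s_j^I(t)), assumed product form
        q = 1.0
        for j in nbrs:
            q *= 1.0 - beta * R[i][j] * s_inf[j]
        f_i = 1.0 - q
        # p_i^I(t+1) from Eq. (1); p_i^S(t+1) is its complement
        p_inf = (1 - s_inf[i]) * f_i + (1.0 - delta) * s_inf[i]
        # MultiRealize: sample the new status from the two-point PMF
        new_status.append(1 if rng.random() < p_inf else 0)
    return new_status
```

Repeatedly calling `sis_step` and averaging the statuses over time gives a Monte Carlo estimate of the quantities that the deterministic probability equations below describe.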

We consider two cases of infection spreading: the contact process and the reactive process. The contact process [17]-[19] is a dynamical process that involves a single stochastic contagion per infective node per unit time, while in the reactive process pUH^ there are as many stochastic contagions per unit time as a node has neighbours. The distinction between the two processes is reflected in the probability $f_i(t)$ that a susceptible node i receives the infection from any combination of its infective neighbours. The probability $f_i(t)$ has the form:

$$f_i(t) = 1 - \prod_{j=1}^{N}\left(1 - \beta\, r_{ij}\, s_j^I(t)\right), \tag{3}$$

where $r_{ij}$ is a contact probability. Without loss of generality, it is instructive to think of these probabilities as the transition probabilities of random walkers on the network. The general case is represented by $\lambda_i$ random walkers leaving node i at each time step:

$$r_{ij} = 1 - \left(1 - \frac{a_{ij}}{\sum_{m} a_{im}}\right)^{\lambda_i}.$$

The contact process corresponds to a model dynamics of one contact per unit time, $\lambda_i = 1$, $\forall i$, thus $r_{ij} = a_{ij}/\sum_m a_{im}$. In the reactive process all neighbors are contacted, which corresponds, in this description, to taking the limit $\lambda_i \to \infty$, $\forall i$, resulting in $r_{ij} = a_{ij}$.
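The interpolation between the two processes can be checked numerically. The sketch below assumes the random-walker form $r_{ij} = 1 - (1 - a_{ij}/k_i)^{\lambda_i}$ with $k_i = \sum_m a_{im}$, an unweighted graph given as neighbour lists, and a hypothetical helper name `contact_probabilities`; `float("inf")` stands for the reactive limit $\lambda_i \to \infty$.

```python
def contact_probabilities(adj, lam):
    """Return R with R[i][j] = 1 - (1 - a_ij / k_i) ** lam[i] for each neighbour j.

    Non-adjacent pairs have a_ij = 0 and therefore r_ij = 0; they are omitted.
    lam[i] may be float("inf") to represent the reactive limit.
    """
    R = []
    for i, nbrs in enumerate(adj):
        share = 1.0 / len(nbrs)  # a_ij / k_i for a neighbour j
        R.append({j: 1.0 - (1.0 - share) ** lam[i] for j in nbrs})
    return R


adj = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2
print(contact_probabilities(adj, [1, 1, 1]))           # [{1: 1.0}, {0: 0.5, 2: 0.5}, {1: 1.0}]
print(contact_probabilities(adj, [float("inf")] * 3))  # [{1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0}]
```

With $\lambda_i = 1$ every row sums to one (the row-stochastic matrix of the contact process), while in the limit every neighbour is contacted with certainty, recovering $r_{ij} = a_{ij}$.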

Though exact and realistic, the system of equations (1) is not suitable for an analytical study of the system dynamics, since the new statuses are obtained as a result of a decision process, which transforms a continuous variable into a discrete one. That is why, in what follows, we complement the status-dependent system with a set of adequate probability equations. This approach was introduced by Chakrabarti et al. [10], who analyzed the infection in the network using a system of probability equations, referred to as the Non-Linear Dynamical System (NLDS) model. Adopting their approach for the SIS process (1), we obtain the following set of difference equations for the probabilities of states S and I:



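$$p_i^S(t+1) = p_i^S(t)\,\bigl(1 - f_i(t)\bigr) + \delta\, p_i^I(t), \qquad p_i^I(t+1) = p_i^S(t)\, f_i(t) + (1-\delta)\, p_i^I(t) \tag{4}$$

where $f_i(t)$ is now

$$f_i(t) = 1 - \prod_{j=1}^{N}\left(1 - \beta\, r_{ij}\, p_j^I(t)\right). \tag{5}$$

Note that (4) is a deterministic equation.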

Since $p_i^S(t) + p_i^I(t) = 1$ for all i and all t, we rewrite (4) using $x_i(t) = p_i^I(t)$:

$$x_i(t+1) = \bigl(1 - x_i(t)\bigr)\, f_i(t) + (1-\delta)\, x_i(t). \tag{6}$$
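Because (6) is deterministic, its stationary state can be approximated by simply iterating the map. The following sketch does this from the all-infective state; it assumes neighbour lists `adj`, contact probabilities stored as `R[i][j]`, the product form of Eq. (5), and a hypothetical function name `nlds_stationary`.

```python
def nlds_stationary(adj, R, beta, delta, tol=1e-12, max_iter=100_000):
    """Iterate Eq. (6) from x_i = 1 until successive iterates agree to within tol.

    adj -- neighbour lists; R[i][j] = r_ij for j in adj[i] (assumed formats)
    Returns an approximation of the stationary vector x*.
    """
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new = []
        for i, nbrs in enumerate(adj):
            q = 1.0
            for j in nbrs:
                q *= 1.0 - beta * R[i][j] * x[j]
            f_i = 1.0 - q  # Eq. (5) with p_j^I = x_j
            new.append((1.0 - x[i]) * f_i + (1.0 - delta) * x[i])  # Eq. (6)
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x
```

For the path graph of size 2 with $r_{ij} = a_{ij}$, the iteration settles at $1 - \delta/\beta$ when $\beta > \delta$ and decays to the origin when $\beta < \delta$.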



Equation (6) represents a nonlinear dynamical system $F: [0,1]^N \to [0,1]^N$. The system (6) has two fixed points: the origin, $x_i = 0$, $\forall i \in \{1, \dots, N\}$, and a second fixed point, different from the origin, which we denote by $x^*(G)$ for the graph G. We will write only $x^*$ instead of $x^*(G)$ when it is clear from the context which graph G is considered. At the stationary state:

$$\delta\, x_i^* = (1 - x_i^*)\left(1 - \prod_{j=1}^{N}\left(1 - \beta\, r_{ij}\, x_j^*\right)\right). \tag{7}$$



The origin $x_i = 0$, $\forall i \in \{1, \dots, N\}$, is a fixed point of the system. Using the Jacobian matrix of the system (6) evaluated at the origin,

$$J = (1-\delta)\, I + \beta R,$$

where $R = [r_{ij}]_{N\times N}$, one finds the well-known result [10, 11] that the origin is stable when

$$\frac{\beta}{\delta} < \frac{1}{\lambda_{1,R}}, \tag{8}$$

where $\lambda_{1,R}$ is the largest eigenvalue of the matrix R. Whenever the infection-to-cure ratio $\beta/\delta$ is greater than the network threshold $1/\lambda_{1,R}$, the disease will reach an endemic state in the network. For a contact process $\lambda_{1,R} = 1$, since R is a row-stochastic matrix, while for a reactive process $\lambda_{1,R} = \lambda_{1,A}$. Moreover, when $\beta \neq 0$, $\delta \neq 0$ and $\delta \neq 1$, the ergodicity of the Markov chains describing the SIS dynamics of each node is guaranteed, and therefore (6) has a unique globally stable fixed point. Hence, there exists a critical value of β, $\beta_c = \delta/\lambda_{1,R}$, such that the origin is a globally asymptotically stable fixed point of (6) if $\beta < \beta_c$, and $x_i^*$ for all i is a globally asymptotically stable fixed point of (6) when $\beta > \beta_c$.
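The threshold $\beta_c = \delta/\lambda_{1,R}$ only requires the largest eigenvalue of R. The sketch below estimates it by power iteration in pure Python, under the assumption that R is stored as dict rows over neighbour lists; the function names are illustrative. Iterating on $R + I$ rather than $R$ is a design choice that avoids oscillation on bipartite graphs, where $-\lambda_{1,R}$ is also an eigenvalue.

```python
def largest_eigenvalue(adj, R, iters=10_000, tol=1e-13):
    """Power iteration for lambda_1 of the nonnegative matrix R (stored as dict rows).

    The iteration runs on R + I; the shift prevents oscillation on bipartite graphs
    and is subtracted again before returning.
    """
    n = len(adj)
    v = [1.0] * n
    mu = 1.0
    for _ in range(iters):
        w = [v[i] + sum(R[i][j] * v[j] for j in adj[i]) for i in range(n)]
        new_mu = max(w)  # v is normalised so that max(v) = 1
        v = [wi / new_mu for wi in w]
        if abs(new_mu - mu) < tol:
            return new_mu - 1.0
        mu = new_mu
    return mu - 1.0


def critical_beta(adj, R, delta):
    """beta_c = delta / lambda_{1,R}, below which the origin is stable."""
    return delta / largest_eigenvalue(adj, R)
```

For a complete graph on four nodes with $r_{ij} = a_{ij}$ this gives $\lambda_{1,R} = 3$; with $r_{ij} = a_{ij}/k_i$ it gives 1 for any connected graph. The same formula reproduces the Enron value quoted in Section III: $0.5/118.4177 \approx 0.004222$.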



III. REACTIVE PROCESS

A. Upper bounds on the probability of being infective



In this section we consider the family Φ of all possible simple and connected graphs with at least two nodes (we exclude from this family the empty graph and the graph with a single node and no links) and SIS reactive processes on this family for which the stationary solution $x^*$, different from the origin, is an asymptotically stable fixed point of (6). For the reactive process, since $r_{ij} = a_{ij}$, we rewrite (7) as:

$$x_i^* = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} x_j^*\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} x_j^*\right) + \delta}. \tag{9}$$



Our first observation, which served as a building block for deriving the bounds for both the contact and the reactive processes, was that the stationary probability of infection $x_i^*$ of the reactive model (9) is, for all nodes i, bounded by

$$x_i^* < \frac{1}{1+\delta}. \tag{10}$$

This is formally stated in Lemma A.1 in Appendix A. Note that bound (10) is independent of the specific network topology. Its right-hand side corresponds to the stationary solution of (9) for an infinitely large full-mesh graph. Bound (10) is rough and uses no information about the topology. A better bound can be obtained if one considers the degree of node i; in this case, we have:



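$$x_i^* < \frac{1 - \left(1 - \frac{\beta}{1+\delta}\right)^{k_i}}{1 - \left(1 - \frac{\beta}{1+\delta}\right)^{k_i} + \delta}, \tag{11}$$

where $k_i$ is the degree of node i. In general, one can find progressively better bounds for $x_i^*$ by using more information on the graph topology. In fact, let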



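$$u_i^n = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} u_j^{n-1}\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} u_j^{n-1}\right) + \delta}, \qquad n \ge 1, \tag{12}$$

where $u_i^0 = 1/(1+\delta)$. Then $x_i^*$ is bounded by

$$x_i^* < \dots < u_i^n < \dots < u_i^1 < u_i^0 \tag{13}$$

for all i. For a formal definition and proof, see Theorem A.2 in Appendix A.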
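The recursion (12) is straightforward to evaluate, and $u_i^1$ reproduces the degree-based bound (11). The following minimal sketch assumes an unweighted graph given as neighbour lists (so $a_{ij} = 1$ exactly for neighbours); the function name is hypothetical. Note that $u_i^n$ only needs the degrees of nodes up to $n-1$ hops away from i.

```python
def reactive_upper_bounds(adj, beta, delta, n_max):
    """Return [u^0, u^1, ..., u^n_max] from Eq. (12) for the reactive process.

    adj -- neighbour lists of an unweighted graph (assumed format).
    Each u^n is a list indexed by node.
    """
    u = [1.0 / (1.0 + delta)] * len(adj)  # u_i^0, the topology-free bound (10)
    history = [u]
    for _ in range(n_max):
        new = []
        for nbrs in adj:
            q = 1.0
            for j in nbrs:  # a_ij = 1 only for neighbours of i
                q *= 1.0 - beta * u[j]
            g = 1.0 - q
            new.append(g / (g + delta))
        u = new
        history.append(u)
    return history
```

On the path graph of size 2 the sequence decreases from $1/(1+\delta)$ towards $1 - \delta/\beta$, illustrating both the ordering in (13) and the convergence $u_i^n \to x_i^*$.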

Using arguments similar to those in the proof of Theorem A.2, it can be shown that $\lim_{n\to\infty} u_i^n = x_i^*$ for all i. In this paper we are interested only in small n. A theorem similar to Theorem A.2 can also be proved for lower bounds, but only for those SIS processes for which $\beta > \delta$. The obvious lower bound is $x_i^* > 0$, but substituting it into a recurrence relation similar to (12) produces only 0s. Appendix A contains Theorem A.3 for lower bounds of $x_i^*$:



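$$L_i^0 < L_i^1 < \dots < L_i^n < \dots < x_i^* \tag{14}$$

for all i, which is analogous to Theorem A.2, and the bounds $L_i^n$ are defined as

$$L_i^n = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} L_j^{n-1}\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta a_{ij} L_j^{n-1}\right) + \delta},$$

where $L_i^0 = 1 - \delta/\beta$.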



Note that the left-hand side of (14) is defined only for $\beta > \delta$, since $x_i^* > 0$. This property comes from (8): the graph associated with the smallest stationary solution, $L_i^0 = 1 - \delta/\beta$, is a path graph of size 2 with $\lambda_{1,A} = 1$, whose threshold is therefore $\beta_c = \delta$. The bounds $L_i^n$ are, for all n, likewise defined only for $\beta > \delta$, since the a priori assumption is that the peripheral nodes have no probability of being infected. In order to obtain bounds for $\beta < \delta$, we take a different approach, described in the following subsection.
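The lower bounds of (14) come from the same map as (12), started from $L^0 = 1 - \delta/\beta$ instead of $1/(1+\delta)$. The sketch below, with hypothetical function names and neighbour lists as input, computes both sequences so that the sandwich $L_i^n \le x_i^* \le u_i^n$ can be inspected; it refuses $\beta \le \delta$, where $L^0$ would not be a valid probability.

```python
def reactive_bound_sequence(adj, beta, delta, start, n_max):
    """Iterate the map shared by Eqs. (12) and (14) from a constant starting value."""
    x = [start] * len(adj)
    seq = [x]
    for _ in range(n_max):
        new = []
        for nbrs in adj:
            q = 1.0
            for j in nbrs:
                q *= 1.0 - beta * x[j]
            g = 1.0 - q
            new.append(g / (g + delta))
        x = new
        seq.append(x)
    return seq


def reactive_lower_bounds(adj, beta, delta, n_max):
    """Return [L^0, ..., L^n_max] with L^0 = 1 - delta / beta (requires beta > delta)."""
    if beta <= delta:
        raise ValueError("the bounds L^n require beta > delta")
    return reactive_bound_sequence(adj, beta, delta, 1.0 - delta / beta, n_max)
```

Because the map is monotone, starting below $x^*$ yields a non-decreasing sequence and starting above it yields a non-increasing one, which is the mechanism behind both chains of inequalities.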



B. Lower bounds on the probability of being infective

In the previous section we derived upper bounds, which are valid for all β and δ, and lower bounds, which are valid only for SIS processes with $\beta > \delta$. Since this is a restriction, in this section we find lower bounds valid for all β and δ by observing the following: if $G' = (V', E')$ is a subgraph of $G = (V, E)$, with $x_i^*$ and $x_i'^*$ being the stationary solutions of (9) for the graphs G and G', respectively, then $x_i'^* \le x_i^*$ for an arbitrary node $i \in V \cap V'$. In other words, as we remove edges (and nodes) from a graph, the probability of infection cannot increase for any remaining node. This is stated formally in Lemma A.4 in Appendix A. Using this property, we can derive a lower bound for an arbitrary node by simply obtaining (numerically) the stationary solution of (9) for the 1-hop neighborhood of node i. Then $x_i'^*$ is a lower bound for $x_i^*$:



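$$x_i^* \ge l_i^1, \tag{15}$$

where

$$l_i^1 = \frac{1 - \left(1 - \beta x'\right)^{k_i}}{1 - \left(1 - \beta x'\right)^{k_i} + \delta}, \qquad x' = \frac{\beta\, l_i^1}{\beta\, l_i^1 + \delta}, \tag{16}$$

is the stationary solution of (9) for the hub (central node) of a star graph G' with $k_i + 1$ nodes, $k_i$ being the degree of node i, and $x'$ is the corresponding stationary solution for each leaf.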
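Solving the hub/leaf pair of (15)-(16) only needs a two-variable fixed-point iteration. The sketch below (hypothetical name `star_hub_stationary`) starts from the all-infective state, from which the monotone map decreases to the non-zero stationary solution when it exists, and to the origin otherwise. The hub is exposed to $k$ leaves while each leaf sees only the hub.

```python
def star_hub_stationary(k, beta, delta, tol=1e-14, max_iter=1_000_000):
    """Stationary (hub, leaf) infection probabilities of a star with k leaves.

    Fixed-point iteration of the reactive hub/leaf equations, started from the
    all-infective state; it converges to the largest fixed point.
    """
    hub = leaf = 1.0
    for _ in range(max_iter):
        g = 1.0 - (1.0 - beta * leaf) ** k  # the hub is exposed to k leaves
        new_hub = g / (g + delta)
        new_leaf = beta * new_hub / (beta * new_hub + delta)  # a leaf sees only the hub
        if abs(new_hub - hub) < tol and abs(new_leaf - leaf) < tol:
            return new_hub, new_leaf
        hub, leaf = new_hub, new_leaf
    return hub, leaf
```

The returned hub value is the lower bound $l_i^1$ for a node of degree $k_i = k$. Unlike $L_i^0$, it stays meaningful for $\beta < \delta$: it is simply 0 below the star's own threshold $\delta/\sqrt{k}$.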



Note that bound (15), unlike bound (10), uses 1-hop topology information (the degree of the node) a priori, thus avoiding the problem when $\beta < \delta$. Consequently, one can find progressively better lower bounds for node i by solving (9) for different subgraphs of G.

To show this, we now define a class of subgraphs called p-hop neighborhoods. Let i be an arbitrary node of the graph G = (V, E), i ∈ V, and let $n_i = \max_x l(i, x)$, where $l(i, j)$ is the length of the shortest path between nodes i and j. Let $V_i^0 = \{i\}$. We define a subgraph $G_i^p = (V_i^p, E_i^p)$ of G = (V, E) as follows:

$$V_i^p = \{x \mid x \in V,\ 0 \le l(i, x) \le p\},$$
$$E_i^p = \{(x, y) \mid (x, y) \in E,\ x \in V_i^p,\ y \in V_i^{p-1}\},$$

where $p = 1, \dots, n_i + 1$. We say that $G_i^p$ is the p-hop neighborhood of node $i \in V$ (see Figure 2). For example, $G_i^1 = (\{i\} \cup V_i, E_i)$, where $E_i$ is the set of edges adjacent to node i and $V_i$ is the set of all neighbors of i. In fact, $G_i^1$ is a star graph with $k_i$ leaves and root i. Note that $G_i^{n_i+1}$ is the entire graph G, and that the first triangle can occur in $G_i^2$ but not in $G_i^1$.

[Figure 2: panels (a) to (c)]

FIG. 2. (b) and (c) 1-hop and 2-hop neighborhoods of the gray node extracted from the graph in (a).
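The definition translates directly into a breadth-first search. In the sketch below (hypothetical name `p_hop_neighborhood`, undirected graph as neighbour lists, $p \ge 1$), a BFS truncated at depth p yields $V_i^p$; since every neighbour of a node in $V_i^{p-1}$ lies in $V_i^p$, the edge set $E_i^p$ is exactly the set of edges with at least one end point in $V_i^{p-1}$.

```python
from collections import deque


def p_hop_neighborhood(adj, i, p):
    """Return (V_i^p, E_i^p) for node i and p >= 1.

    adj -- neighbour lists of an undirected graph (assumed format).
    Edges are returned as sorted tuples (x, y) with x < y.
    """
    dist = {i: 0}  # breadth-first search up to depth p gives l(i, x)
    queue = deque([i])
    while queue:
        x = queue.popleft()
        if dist[x] == p:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    nodes = set(dist)                                   # V_i^p
    inner = {x for x, d in dist.items() if d <= p - 1}  # V_i^{p-1}
    # E_i^p: edges with one end in V_i^{p-1}; the other end is then in V_i^p
    edges = {tuple(sorted((x, y))) for x in inner for y in adj[x]}
    return nodes, edges
```

On a triangle with a pendant node, $G_i^1$ of a triangle vertex is a star, and the edge closing the triangle first appears in $G_i^2$, as noted above.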

Finally, if $l_i^p$ is the probability of infection of node i given its p-hop neighborhood, then by Theorem A.5, proven in Appendix A, the probability of infection $x_i^*$ given the entire graph G is bounded by

$$l_i^1 \le l_i^2 \le \dots \le l_i^{n_i+1} = x_i^*, \tag{17}$$

where $n_i + 1$ is such that $E_i^{n_i+1} = E$, i.e., the $(n_i+1)$-hop neighborhood of node i contains the entire graph G.
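Computing $l_i^p$ therefore amounts to solving (9) on the edges of $G_i^p$ and reading off node i. The self-contained sketch below does this by fixed-point iteration from the all-infective state; the helper names are hypothetical and the graph is assumed to be given as neighbour lists. Nodes outside $G_i^p$ are simply absent, which is the "no infection from outside" assumption behind the lower bound.

```python
from collections import deque


def p_hop_edges(adj, i, p):
    """E_i^p: all edges with at least one end point within distance p - 1 of node i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        x = queue.popleft()
        if dist[x] == p - 1:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return {tuple(sorted((x, y))) for x in dist for y in adj[x]}


def reactive_stationary(edges, beta, delta, tol=1e-13, max_iter=1_000_000):
    """Largest fixed point of Eq. (9) on the graph spanned by `edges` (dict node -> x*)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    x = {v: 1.0 for v in nbrs}
    for _ in range(max_iter):
        new = {}
        for v, nb in nbrs.items():
            q = 1.0
            for w in nb:
                q *= 1.0 - beta * x[w]
            g = 1.0 - q
            new[v] = g / (g + delta)
        if max(abs(new[v] - x[v]) for v in x) < tol:
            return new
        x = new
    return x


def local_lower_bound(adj, i, p, beta, delta):
    """l_i^p: stationary infection probability of node i computed on G_i^p only."""
    return reactive_stationary(p_hop_edges(adj, i, p), beta, delta)[i]
```

For a degree-1 node, $l_i^1$ is the path-graph value $1 - \delta/\beta$, and increasing p up to $n_i + 1$ climbs monotonically to the full-graph solution, as stated in (17).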

C. Numerical results 

In the previous section we have proved that

$$l_i^p \le x_i^* \le u_i^n$$

for $i = 1, \dots, N$, $p = 1, \dots, n_i + 1$, and $n = 1, 2, \dots$. Note that only when $p = n_i + 1$ do we have $l_i^p = x_i^*$; otherwise $l_i^p < x_i^*$. The bounds $l_i^1$ and $u_i^1$ are obtained by considering only the (first) neighbors of i. The bound $u_i^1$ depends on the degree of node i, that is, on the information contained in the 1-hop neighborhood of G extracted by starting at node i, while for the bound $l_i^1$ one computes the SIS model on the subgraph $G_i^1$, spanned by i and its neighbors. In a similar fashion, the bounds $l_i^2$ and $u_i^2$ are obtained by considering the second neighbors of i (the neighbors of the first neighbors). The bound $u_i^2$ can be computed by using $u_j^1$ for all neighbors j of i. Thus, $u_i^2$ reflects the topology of the 2-hop neighborhood of G extracted by starting at node i. Finally, for $n = n_i + 1$, since $G_i^{n_i+1}$ is the entire graph G, $u_i^{n_i+1}$ takes into account the topology of the whole network. Therefore, it makes sense to calculate the difference $d_i^p = u_i^p - l_i^p$, taking $n = p$, for small values of p. In this way one could, at least numerically, answer one of the basic questions in mathematical epidemiology for any graph: what is the influence of the graph topology on disease spreading or, more precisely, on the probability that a given node will be infected?

Lower bounds are derived as stationary solutions of the SIS process on the corresponding subgraphs. Upper bounds, on the other hand, are found by back-propagation using Eq. (12). As a consequence, when $p = n_i + 1$, $l_i^{n_i+1} = x_i^*$ while $u_i^{n_i+1} > x_i^*$, and thus $d_i^{n_i+1} > 0$. In fact (see Remark 3.3), the difference vanishes only in the limit, $\lim_{n\to\infty} u_i^n - l_i^{n_i+1} = 0$.

In this section we study the Enron e-mail network obtained from [23], running (6) on the network. The Enron e-mail network has 33696 nodes and 361622 edges, with $\lambda_{1,A} = 118.4177$ and $\beta_c = 0.004222$ when $\delta = 0.5$ for the reactive process. We study the upper and lower bounds of the expected density of infection $\rho = \sum_i x_i^*/N$, calculated as

$$\overline{\rho}_p = \frac{1}{N}\sum_{i=1}^{N} u_i^p, \qquad \underline{\rho}_p = \frac{1}{N}\sum_{i=1}^{N} l_i^p,$$

for different values of p, as well as the average difference $\Delta\rho_p$ between the upper and lower bounds over all nodes,

$$\Delta\rho_p = \frac{1}{N}\sum_{i=1}^{N} d_i^p.$$

We also calculate $d_i^p$ for 3 nodes: the node with minimum degree, the node with maximum degree, and a node with average degree. To give a better idea of how much local information is being used, Table I lists the size of the p-hop neighborhood for the Enron e-mail network, $|E^p|$, measured as the number of edges in the corresponding subgraph averaged over all nodes i, as well as its fraction of the total number of edges in the network.
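Once the node-level bounds are available (for example from the procedures sketched for (12) and (17)), the network-level quantities are plain averages. The following minimal sketch, with a hypothetical function name and per-node lists as input, returns $\underline{\rho}_p$, $\overline{\rho}_p$ and $\Delta\rho_p$, and checks that every lower bound lies below its upper bound.

```python
def density_bounds(lower, upper):
    """Aggregate node-level bounds l_i^p and u_i^p into network-level quantities.

    Returns (rho_lower_p, rho_upper_p, delta_rho_p), where delta_rho_p is the
    average of d_i^p = u_i^p - l_i^p over all N nodes.
    """
    n = len(lower)
    if n == 0 or len(upper) != n:
        raise ValueError("need one lower and one upper bound per node")
    if any(l > u for l, u in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    rho_lower = sum(lower) / n
    rho_upper = sum(upper) / n
    delta_rho = sum(u - l for l, u in zip(lower, upper)) / n
    return rho_lower, rho_upper, delta_rho
```

Since averaging is linear, $\Delta\rho_p$ also equals $\overline{\rho}_p - \underline{\rho}_p$; computing it from the per-node gaps keeps the $d_i^p$ values at hand for inspecting individual nodes.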



TABLE I. Average size of the p-hop neighborhood for the Enron e-mail network. $|E^p|$ is the average of $|E_i^p|$ over all nodes i, and $|E|$ is the total number of edges in the network.

p    |E^p|     |E^p|/|E|
1    10        0.0003
2    1538      0.004
3    45067     0.125
4    207496    0.574


Figures 3 and 4 summarize our results. As depicted in Figure 3, the bounds are surprisingly tight even when only 2-hop topology information is used. More precisely, we obtain a very good estimate for the infection density ρ by summing over node-level estimates which use only 0.4% of the network topology on average (see Table I). The upper bound $\overline{\rho}_1$ is also surprisingly tight, given that it uses only the nodes' degrees and no information about the edges in the network.

FIG. 3. The density of infective nodes in the Enron e-mail network as the transmission parameter β is varied, with δ = 0.5, obtained by simulating (6) on the network until the model stabilizes, along with the upper and lower bounds, $\overline{\rho}_p$ and $\underline{\rho}_p$, on ρ using 1-hop and 2-hop topology information.

Figure 4 shows that the average difference $\Delta\rho_p$ between the bounds decreases as one considers 1-hop, 2-hop, and 3-hop topology information, as expected, since the bounds around the stationary infection density $\rho = \sum_i x_i^*/N$ become tighter. The spreading also becomes less topology dependent as the disease transmission parameter β increases. When β is close to $\beta_c$, $x_i^*$ and consequently ρ are close to zero as well. Therefore, as β approaches $\beta_c$, the value of $x_i^*$ is expected to be influenced by the whole network. For the Enron network, as indicated in Figure 4, the average difference between the bounds calculated from 3-hop neighborhoods is close to zero, $\Delta\rho_3 < 0.012$ for all β. Additionally, when $\beta > 0.4$, only the topology up to 2 hops away is relevant for the spreading process, $\Delta\rho_2 < 0.01$. The bounds calculated using only 1-hop topology, i.e., the nodes' degrees, are wide apart for all values of β for this particular network, indicating that the specific degrees are highly influential in the spreading process.

In this light, we examine the difference between the bounds for three randomly chosen nodes with a particular degree: one with minimum degree, one with average degree, and one with maximum degree. Note that the results vary greatly across the three types of nodes. The difference is smallest for the node with minimum degree, reaching a maximum of 0.028 over all β given only 2-hop topology information (0.4% of the entire network topology on average). It is also worth noting that the lower bound is very close to the actual result, the difference being due to the upper bound requiring more information to converge. Interestingly, while the gap for the node with maximum degree quickly decreases as β increases, it can be large for specific values of β. In contrast to the minimum-degree case, for the node with maximum degree the difference is due to the lower bound not having converged, while the upper bound is quite tight. Another interesting result for the node with maximum degree is that once β exceeds 0.04, knowing only the neighbors of the node's neighbors suffices for predicting the outcome of the infection. For the node with average degree, we observe two interesting results. First, the difference persists over a relatively wide span of β (as in the minimum-degree case). Second, for some specific values of β, $d_i^p$ can be relatively large (as in the maximum-degree case). However, the difference $d_i^3$ is smaller than 0.03 for all values of β given the 3-hop topology information, which constitutes approximately 12.5% of the total network topology on average.

Finally, for all nodes, the bounds on the probability of being infective are tighter as $\beta \to 1$, and looser as $\beta \to \beta_c$. We conclude that as $\beta \to \beta_c$, network topology plays a bigger role in the dynamics of the spreading process.



IV. CONTACT PROCESS 
A. Bounds on the probability of being infective 

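For the contact process, since $r_{ij} = a_{ij}/\sum_k a_{ik}$, we rewrite (7) as:

$$x_i^* = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta \frac{a_{ij}}{\sum_k a_{ik}} x_j^*\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta \frac{a_{ij}}{\sum_k a_{ik}} x_j^*\right) + \delta}. \tag{18}$$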



From Lemmas B.1 and B.2 (see Appendix B) we have the following bounds for the stationary solution of (18):

$$x_p = \frac{1 - e^{-\beta x_p}}{1 - e^{-\beta x_p} + \delta} \le x_i^* \le 1 - \frac{\delta}{\beta} = u^0. \tag{19}$$

Note that bound (19) is independent of the specific network topology. Its left-hand and right-hand sides correspond to the stationary solution of (18) for an infinitely large full-mesh graph and for a path graph of size 2, respectively.

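As for the reactive process, better bounds can be obtained if we use a node's degree:

$$\frac{1 - \left(1 - \frac{\beta x_p}{k_i}\right)^{k_i}}{1 - \left(1 - \frac{\beta x_p}{k_i}\right)^{k_i} + \delta} \le x_i^* \le \frac{1 - \left(1 - \frac{\beta - \delta}{k_i}\right)^{k_i}}{1 - \left(1 - \frac{\beta - \delta}{k_i}\right)^{k_i} + \delta}, \tag{20}$$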
[Figure 4 panels: all nodes; minimum degree; average degree; maximum degree]

FIG. 4. The average difference $\Delta\rho_p$ between the upper and lower bounds (top left), and the difference between the upper and lower bounds that use 2-hop, 3-hop and 4-hop topology information for a node i in the Enron e-mail network that has minimum (bottom left), average (top right) and maximum degree (bottom right), as the transmission parameter β is varied and δ = 0.5.



where $k_i$ is the degree of node i and $x_p$ is the nonzero solution of the equation

$$x_p = \frac{1 - e^{-\beta x_p}}{1 - e^{-\beta x_p} + \delta}.$$
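The quantities in (19)-(20) are cheap to evaluate. The sketch below is a hedged illustration that assumes the contact-process form of (18) with $r_{ij} = a_{ij}/k_i$ and neighbour lists as input; all function names are hypothetical. It finds $x_p$ by fixed-point iteration from 1 (which descends to the non-zero root when $\beta > \delta$), evaluates the degree-based bounds, and includes a stationary solver so that the bounds can be checked against $x_i^*$ on a small graph.

```python
import math


def contact_xp(beta, delta, tol=1e-14, max_iter=1_000_000):
    """Nonzero root of x = (1 - e^{-beta x}) / (1 - e^{-beta x} + delta); needs beta > delta."""
    if beta <= delta:
        raise ValueError("an endemic state requires beta > beta_c = delta")
    x = 1.0
    for _ in range(max_iter):
        g = 1.0 - math.exp(-beta * x)
        new = g / (g + delta)
        if abs(new - x) < tol:
            return new
        x = new
    return x


def contact_degree_bounds(k, beta, delta):
    """Degree-only (lower, upper) bounds for a node of degree k in the contact process."""
    def h(y):  # g / (g + delta) with g = 1 - (1 - beta * y / k) ** k
        g = 1.0 - (1.0 - beta * y / k) ** k
        return g / (g + delta)
    return h(contact_xp(beta, delta)), h(1.0 - delta / beta)


def contact_stationary(adj, beta, delta, tol=1e-13, max_iter=1_000_000):
    """Largest fixed point of Eq. (18) with r_ij = a_ij / k_i, by fixed-point iteration."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new = []
        for nbrs in adj:
            q = 1.0
            for j in nbrs:
                q *= 1.0 - beta * x[j] / len(nbrs)
            g = 1.0 - q
            new.append(g / (g + delta))
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x
```

For $k = 1$ the upper bound reduces to $1 - \delta/\beta$, and as $k$ grows the lower bound approaches $x_p$, matching the two extremes of (19).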

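More generally, by using n-hop neighborhoods, the stationary solution $x_i^*$ of (18) for an arbitrary node i is bounded by

$$l_i^0 < l_i^1 < \dots < l_i^n < x_i^* < u_i^n < \dots < u_i^1 < u_i^0, \tag{21}$$

where

$$l_i^n = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta r_{ij} l_j^{n-1}\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta r_{ij} l_j^{n-1}\right) + \delta} \quad\text{and}\quad u_i^n = \frac{1 - \prod_{j=1}^{N}\left(1 - \beta r_{ij} u_j^{n-1}\right)}{1 - \prod_{j=1}^{N}\left(1 - \beta r_{ij} u_j^{n-1}\right) + \delta},$$

with $r_{ij} = a_{ij}/\sum_k a_{ik}$, and

$$l_i^0 = \frac{1 - e^{-\beta l_i^0}}{1 - e^{-\beta l_i^0} + \delta} \quad\text{and}\quad u_i^0 = 1 - \frac{\delta}{\beta}.$$

For a formal definition and proof, see Theorem B.3 in Appendix B. Note that here, unlike in the reactive process, the problem with $1 - \delta/\beta$ becoming negative when $\delta > \beta$ is avoided, since in the contact process $\beta_c = \delta$, so the endemic regime always has $\beta > \delta$.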
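The two sequences in (21) share one map and differ only in their starting values. The following sketch assumes $r_{ij} = a_{ij}/k_i$, neighbour lists as input, and a hypothetical function name; it returns $[l^0, \dots, l^{n_{\max}}]$ and $[u^0, \dots, u^{n_{\max}}]$, whose gap $d_i^n$ is the quantity studied in the numerical results below.

```python
import math


def contact_bound_sequences(adj, beta, delta, n_max, tol=1e-14):
    """Return (lower, upper) = ([l^0..l^n_max], [u^0..u^n_max]) for the contact process."""
    if beta <= delta:
        raise ValueError("requires beta > beta_c = delta")
    # l^0: nonzero root of l = (1 - e^{-beta l}) / (1 - e^{-beta l} + delta)
    l0 = 1.0
    while True:
        g = 1.0 - math.exp(-beta * l0)
        new = g / (g + delta)
        if abs(new - l0) < tol:
            l0 = new
            break
        l0 = new
    u0 = 1.0 - delta / beta  # stationary value of the path graph of size 2

    def step(x):  # one application of the map, with r_ij = a_ij / k_i
        out = []
        for nbrs in adj:
            q = 1.0
            for j in nbrs:
                q *= 1.0 - beta * x[j] / len(nbrs)
            g = 1.0 - q
            out.append(g / (g + delta))
        return out

    lower, upper = [[l0] * len(adj)], [[u0] * len(adj)]
    for _ in range(n_max):
        lower.append(step(lower[-1]))
        upper.append(step(upper[-1]))
    return lower, upper
```

The first step of each sequence reproduces the degree-only bounds of (20), while later steps fold in the degrees of increasingly distant neighbours.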



B. Numerical results 

Again, it makes sense to calculate the difference $d_i^n = u_i^n - l_i^n$ between the upper and lower bounds derived using n-hop topology information for node i, in order to determine the dependence of the contact process on the specific network topology.

Figures 5 and 6 show the numerical results. As in the reactive process, we study the upper bound $\overline{\rho}_n$ and lower bound $\underline{\rho}_n$ on ρ, as well as the average difference between the upper and lower bounds, $\Delta\rho_n$. We also study $d_i^n$ for 3 nodes: the node with minimum degree, a node with average degree, and the node with maximum degree. Note that $\beta_c = 0.5$ when $\delta = 0.5$.

In general, the contact process is less dependent on the network topology than the reactive process, as the largest value of $\Delta\rho_1$ is an order of magnitude smaller than the corresponding value for the reactive process. The more topology information is included in the calculation of the bounds, the smaller the difference between them. Also, the probabilities that each of the 3 examined nodes is infective are equally dependent on the network topology, since $d_i^n$ for n = 1, 2, 3 take similar values. Contrary to the reactive process, the bounds on the probability of being infective are tighter as $\beta \to \beta_c$ (see Figure 5). In Appendix C we show that when β is close to $\beta_c = \delta$, the probabilities of being infective have an analytical solution in closed form; they are no longer topology dependent, and are functions only of the spreading process parameters β and δ.




FIG. 5. The expected infection density ρ in the endemic state for the Enron e-mail network for the contact process as β is varied, with δ = 0.5. The bounds on ρ calculated with no topology information ($\underline{\rho}_0$, $\overline{\rho}_0$), with 1-hop topology information ($\underline{\rho}_1$, $\overline{\rho}_1$), and with 2-hop topology information ($\underline{\rho}_2$, $\overline{\rho}_2$) are depicted as well.



V. CONCLUSIONS 

In this paper we have derived upper and lower bounds on the probability that a node is infective for the SIS model of infection spreading on networks, where the behavior of a node is modeled with a discrete-time Markov chain. We have considered the reactive and the contact process as two cases of the spreading process. For both processes, we use the difference between the upper and lower bounds at the microscopic level to assess the dependence of the spreading processes on network topology. Numerical results are given for the Enron e-mail network. For both processes, the bounds become progressively better as one considers a larger n-hop neighborhood of a node. For the reactive process, both bounds on the probability that a node is infective are tighter as $\beta \to 1$, and their difference is largest for nodes with average degree. Conversely, the bounds on the probability that a node is infective for the contact process are tighter as $\beta \to \beta_c$.

One of the main implications of the paper is that, if β is larger than its critical value (when β is close to $\beta_c$, the probability of a node being infective is close to zero anyway), one can estimate the probability of being infective using only local information (considering only the n-hop local topology, for small n), without knowing the whole network. Consequently, from this local information one can also estimate the density of infective nodes in the whole network, as well as assess the extent to which the topology affects the outcome of the infection at the macroscopic level.



The results of this paper are easily extendable to other ergodic models (such as SIRS, for example) and are related to all types of spreading (ideas, failures, rumors) [Ml -, regardless of the type of the spreading agent. How these results can be extended to the SIR model, by considering the SIRS model and letting one of its parameters approach zero (or one) so that the SIRS model approaches the SIR model in this limit, is a question for further research.



Appendix A: Bounds for reactive process 

Lemma A.1. Let Φ be the family of all possible simple and connected graphs G = (V, E) with |V| ≥ 2. Let $x^*(G) = [x_1^*\ x_2^* \cdots x_N^*]^T$ be the stationary solution of (9) different from the origin for the graph G ∈ Φ, G = (V, E), where N = |V|. Let $x^*_{\max} = \max_{G\in\Phi} \max_i x_i^*(G)$. Then for all i, $x_i^*$ is bounded by

$$x_i^* < \frac{1}{1+\delta}.$$

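Proof. Let $\Lambda$ be the set of neighbors of the node associated with the value $x^*_{\max}$. We will show that $x_j^* = x^*_{\max}$ for all $j \in \Lambda$ by contradiction. Let the node associated with the value $x^*_{\max}$ be node $k$, i.e., $x_k^* = x^*_{\max}$. Assume that $x_j^* = x^*_{\max}$ for all $j \in \Lambda$ is false. Then there exists at least one node $z \in \Lambda$ such that $x_z^* < x^*_{\max}$. But this means that $x_k^* < x^*_{\max}$, since

$$\frac{\partial x_k}{\partial x_i} = \frac{\beta \delta r_{ki}}{\left(\zeta_k + \delta\right)^2} \prod_{j \in \Lambda \setminus \{i\}} \left(1 - \beta r_{kj} x_j\right) > 0, \qquad \zeta_k = 1 - \prod_{j \in \Lambda} \left(1 - \beta r_{kj} x_j\right),$$

which contradicts our first statement that $x_k^* = x^*_{\max}$. Let $n = |\Lambda|$. From the stationary equation and the fact that $x_j^* = x^*_{\max}$ for all $j \in \Lambda$, we have

$$x^*_{\max} = \frac{1 - \left(1 - \beta x^*_{\max}\right)^n}{1 - \left(1 - \beta x^*_{\max}\right)^n + \delta}.$$

Since $\partial x^*_{\max} / \partial n > 0$, the maximum value of $x^*_{\max}$ is obtained as $n \to \infty$. Finally, the bound (10) comes directly from

$$x_i^* \leq \lim_{n \to \infty} \frac{1 - \left(1 - \beta x^*_{\max}\right)^n}{1 - \left(1 - \beta x^*_{\max}\right)^n + \delta} = \frac{1}{1 + \delta}.$$

□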
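As a numerical illustration of the extremal case in the proof, the sketch below solves the single-node equation $x = \zeta/(\zeta + \delta)$ with $\zeta = 1 - (1 - \beta x)^n$, i.e. a node whose $n$ neighbors all share its own value. It assumes the reactive stationary map has the form displayed above with $r_{ij} = a_{ij}$; the function name `regular_fixed_point` and the parameter values are ours, chosen only for illustration. The solution should grow with $n$ while staying below $1/(1+\delta)$.

```python
# Assumption: reactive stationary map x = z / (z + delta), z = 1 - (1 - beta*x)**n,
# for a node whose n neighbours all have the same value x (extremal case of Lemma A.1).
def regular_fixed_point(n, beta, delta, tol=1e-13, max_iter=100_000):
    """Nonzero solution of x = z / (z + delta) with z = 1 - (1 - beta * x) ** n.

    The right-hand side is increasing in x and maps x = 1 strictly below 1,
    so iterating from x = 1 decreases monotonically to the largest fixed
    point, which is nonzero whenever beta * n > delta.
    """
    x = 1.0
    for _ in range(max_iter):
        z = 1.0 - (1.0 - beta * x) ** n
        x_next = z / (z + delta)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


beta, delta = 0.8, 0.5  # illustrative values
for n in (1, 2, 5, 20):
    print(f"n = {n:2d}: x = {regular_fixed_point(n, beta, delta):.6f}")
print(f"bound 1/(1 + delta) = {1 / (1 + delta):.6f}")
```

For $n = 1$ the iteration returns $1 - \delta/\beta = 0.375$, the value that appears as $l_i^0$ in Theorem A.3.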



Theorem A.2. Let $\Phi$ be the family of all possible simple and connected graphs $G = (V, E)$ with $|V| \geq 2$. Let $x^*(G) = [x_1^* \, x_2^* \cdots x_N^*]$ be the stationary solution of the reactive process different from the origin and let $i$ be an arbitrary node of the graph $G = (V, E)$, $i \in V$. Let

$$u_i^n = \frac{1 - \prod_{j=1}^{N} \left(1 - \beta a_{ij} u_j^{n-1}\right)}{1 - \prod_{j=1}^{N} \left(1 - \beta a_{ij} u_j^{n-1}\right) + \delta},$$

where $u_i^0 = 1/(1 + \delta)$. Then $x_i^*$ is bounded by

$$x_i^* \leq u_i^n \leq \cdots \leq u_i^1 \leq u_i^0$$

for all $i$.

[Figure 6: panels include "All nodes" and "Average degree"; horizontal axis $\beta$, from 0.6 to 1.0.]

FIG. 6. The average difference $\Delta p_n$ between the upper and lower bounds (top left), and the difference between the upper and lower bounds that use 0-hop, 1-hop, 2-hop and 3-hop topology information for a node $i$ in the Enron e-mail network that has minimum (bottom left), average (top right) and maximum degree (bottom right), as the transmission parameter $\beta$ is varied and $\delta = 0.5$.



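Proof. We will first prove that $u_i^n \leq \cdots \leq u_i^1 \leq u_i^0$ by induction. Note that $u_i^0 = u^0$ is topology independent, and

$$u_i^1 = \frac{1 - \left(1 - \beta u^0\right)^{k_i}}{1 - \left(1 - \beta u^0\right)^{k_i} + \delta},$$

where $k_i$ is the degree of node $i$. Since $\partial u_i^1 / \partial k_i > 0$, we have that $u_i^1 \leq \lim_{k_i \to \infty} u_i^1 = u^0$ for all $i$. Now assume that $u_i^p \leq u_i^{p-1}$ holds for all $i$ and $p = 2, 3, \ldots, n-1$. Since $\partial u_i^n / \partial u_j^{n-1} > 0$ and $u_j^{n-1} \leq u_j^{n-2}$, it follows that $u_i^n \leq u_i^{n-1}$ for all $i$.

We will prove that $x_i^* \leq u_i^n$ in a similar fashion. From Lemma A.1 we have that $x_i^* \leq u_i^0$ for all $i$. Now assume that $x_i^* \leq u_i^p$ holds for all $i$ and $p = 1, 2, \ldots, n-1$. Note that $x_i^*$ is $u_i^n$ with $u_j^{n-1}$ replaced by the smaller $x_j^*$ ($x_j^* \leq u_j^{n-1}$). Since $\partial u_i^n / \partial u_j^{n-1} > 0$, it follows that $x_i^* \leq u_i^n$ for all $i$.

□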
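The upper-bound iteration of Theorem A.2 can be run directly on a small graph. This is a minimal sketch, assuming an unweighted graph stored as an adjacency list (so $a_{ij} \in \{0, 1\}$ and the product runs over the neighbors of $i$) and the reactive stationary map in the form shown above; the helper names and the five-node test graph are hypothetical. The nonzero stationary solution is approximated by iterating the same map downwards from $x_i = 1$, which converges to its largest fixed point.

```python
# Assumption: reactive map x_i <- z_i / (z_i + delta), z_i = 1 - prod_{j in N(i)} (1 - beta x_j),
# i.e. r_ij = a_ij on an unweighted graph given as an adjacency list.
def reactive_map(adj, x, beta, delta):
    """One application of the (assumed) reactive stationary map to every node."""
    out = []
    for neighbors in adj:
        prod = 1.0
        for j in neighbors:
            prod *= 1.0 - beta * x[j]
        z = 1.0 - prod
        out.append(z / (z + delta))
    return out


def upper_bounds(adj, beta, delta, n_max):
    """Return [u^0, u^1, ..., u^n_max]; u^0 = 1/(1 + delta) for every node."""
    u = [1.0 / (1.0 + delta)] * len(adj)
    seq = [u]
    for _ in range(n_max):
        u = reactive_map(adj, u, beta, delta)
        seq.append(u)
    return seq


def stationary(adj, beta, delta, tol=1e-13, max_iter=100_000):
    """Largest fixed point of the map, reached by iterating down from x = 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        x_next = reactive_map(adj, x, beta, delta)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


# Hypothetical test graph: triangle 0-1-2 with a tail 2-3-4.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]
beta, delta = 0.8, 0.5
bounds = upper_bounds(adj, beta, delta, n_max=5)
x_star = stationary(adj, beta, delta)
for i in range(len(adj)):
    print(i, [round(u[i], 4) for u in bounds], round(x_star[i], 4))
```

By the theorem, each printed row should be non-increasing in $n$ and stay above the stationary value in the last column.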



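Theorem A.3. Let $\Phi$ be the family of all possible simple and connected graphs $G = (V, E)$ with $|V| \geq 2$. Let $\beta > \delta$. Let $x^*(G) = [x_1^* \, x_2^* \cdots x_N^*]$ be the stationary solution of the reactive process different from the origin and let $i$ be an arbitrary node of the graph $G = (V, E)$, $i \in V$. Let

$$l_i^n = \frac{1 - \prod_{j=1}^{N} \left(1 - \beta a_{ij} l_j^{n-1}\right)}{1 - \prod_{j=1}^{N} \left(1 - \beta a_{ij} l_j^{n-1}\right) + \delta},$$

where $l_i^0 = 1 - \delta/\beta$. Then $x_i^*$ is bounded by

$$l_i^0 \leq l_i^1 \leq \cdots \leq l_i^n \leq x_i^*$$

for all $i$.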
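The lower-bound iteration of Theorem A.3 differs from the upper-bound one only in its starting point $l_i^0 = 1 - \delta/\beta$. The sketch below assumes the reactive stationary map in the form shown above on an unweighted adjacency list; the helper names and the five-node test graph are hypothetical, and the stationary solution is again approximated by iterating the map down from $x_i = 1$.

```python
# Assumption: reactive map x_i <- z_i / (z_i + delta), z_i = 1 - prod_{j in N(i)} (1 - beta x_j).
def reactive_map(adj, x, beta, delta):
    """One application of the (assumed) reactive stationary map to every node."""
    out = []
    for neighbors in adj:
        prod = 1.0
        for j in neighbors:
            prod *= 1.0 - beta * x[j]
        z = 1.0 - prod
        out.append(z / (z + delta))
    return out


def lower_bounds(adj, beta, delta, n_max):
    """Return [l^0, l^1, ..., l^n_max]; l^0 = 1 - delta/beta for every node."""
    if beta <= delta:
        raise ValueError("the lower bound assumes beta > delta")
    low = [1.0 - delta / beta] * len(adj)
    seq = [low]
    for _ in range(n_max):
        low = reactive_map(adj, low, beta, delta)
        seq.append(low)
    return seq


def stationary(adj, beta, delta, tol=1e-13, max_iter=100_000):
    """Largest fixed point of the map, reached by iterating down from x = 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        x_next = reactive_map(adj, x, beta, delta)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


# Hypothetical test graph: triangle 0-1-2 with a tail 2-3-4.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]
beta, delta = 0.8, 0.5
lows = lower_bounds(adj, beta, delta, n_max=5)
x_star = stationary(adj, beta, delta)
for i in range(len(adj)):
    print(i, [round(low[i], 4) for low in lows], round(x_star[i], 4))
```

The starting value $1 - \delta/\beta$ is the stationary probability of a node with a single neighbor of equal value, which is why it requires $\beta > \delta$.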

Lemma A.4. Let $G' = (V', E')$ be a subgraph of $G = (V, E)$. Let node $i \in V \cap V'$, and let $x_i^*$ and $x_i'^*$ be the stationary solutions of the reactive process different from the origin associated with the node $i$ for the graphs $G$ and $G'$, respectively. Then $x_i'^* \leq x_i^*$.

Proof. Without loss of generality, assume that only one edge $e$ between nodes $i$ and $j$ is removed from $G$ in order to obtain $G'$. Starting from the stationary solution $x^*(G)$, and from node $i$'s point of view, the edge removal can be interpreted as a change in $x_j$ from $x_j^*$ to 0. Then, from Lemma A.1, we have a negative change in $x_i$, which will propagate and imply negative changes in $x_k$ for all $k \in V'$ with each iteration of the stationary equation. On the other hand, removing a node can be interpreted as removing its edges. □
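Lemma A.4 can be checked on a small example by removing one edge and comparing the two stationary solutions node by node. The sketch assumes the reactive stationary map in the form shown above on an unweighted adjacency list; the graph, the removed edge $(0, 1)$ and all helper names are hypothetical.

```python
# Assumption: reactive map x_i <- z_i / (z_i + delta), z_i = 1 - prod_{j in N(i)} (1 - beta x_j).
def reactive_map(adj, x, beta, delta):
    """One application of the (assumed) reactive stationary map to every node."""
    out = []
    for neighbors in adj:
        prod = 1.0
        for j in neighbors:
            prod *= 1.0 - beta * x[j]
        z = 1.0 - prod
        out.append(z / (z + delta))
    return out


def stationary(adj, beta, delta, tol=1e-13, max_iter=100_000):
    """Largest fixed point of the map, reached by iterating down from x = 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        x_next = reactive_map(adj, x, beta, delta)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


def without_edge(adj, i, j):
    """Copy of the adjacency list with the undirected edge (i, j) removed."""
    new = [list(neighbors) for neighbors in adj]
    new[i].remove(j)
    new[j].remove(i)
    return new


# Hypothetical test graph: triangle 0-1-2 with a tail 2-3-4; G' drops edge (0, 1).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]
beta, delta = 0.8, 0.5
x_full = stationary(adj, beta, delta)
x_sub = stationary(without_edge(adj, 0, 1), beta, delta)
for i, (a, b) in enumerate(zip(x_full, x_sub)):
    print(f"node {i}: G {a:.4f}   G' {b:.4f}")
```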

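Theorem A.5. Consider an arbitrary node $i$ of the graph $G = (V, E)$ and let $G_i^p = (V_i^p, E_i^p)$ be the $p$-hop neighborhood of $G$ extracted by starting at node $i$. Let $n_i = \max_x l(i, x)$, and let $x^*(G) = [x_1^* \cdots x_N^*]$ and $x^*(G_i^p)$ be the stationary solutions of the reactive process different from the origin for the graphs $G = (V, E)$ and $G_i^p = (V_i^p, E_i^p)$, respectively, where $x_i^*(G_i^p)$ denotes the component associated with node $i$. Then $x_i^*$ is bounded by

$$x_i^*(G_i^1) \leq x_i^*(G_i^2) \leq \cdots \leq x_i^*(G_i^{n_i}) = x_i^*$$

for all $i \in V$.

Proof. The inequality $x_i^*(G_i^{p-1}) \leq x_i^*(G_i^p)$ for all $p = 2, 3, \ldots, n_i$ comes directly from Lemma A.4, since $G_i^{p-1}$ is a subgraph of $G_i^p$. □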
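Theorem A.5 suggests a purely local estimator: compute the stationary solution on the $p$-hop neighborhood only. The sketch below assumes that $G_i^p$ is the subgraph induced by the nodes within $p$ hops of $i$ (found by breadth-first search), that $n_i$ is the largest hop distance from $i$, and that the reactive stationary map has the form shown above; the function names and the seven-node test graph are hypothetical.

```python
from collections import deque


# Assumption: reactive map x_i <- z_i / (z_i + delta), z_i = 1 - prod_{j in N(i)} (1 - beta x_j).
def reactive_map(adj, x, beta, delta):
    out = []
    for neighbors in adj:
        prod = 1.0
        for j in neighbors:
            prod *= 1.0 - beta * x[j]
        z = 1.0 - prod
        out.append(z / (z + delta))
    return out


def stationary(adj, beta, delta, tol=1e-13, max_iter=100_000):
    """Largest fixed point of the map, reached by iterating down from x = 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        x_next = reactive_map(adj, x, beta, delta)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_estimates(adj, i, beta, delta):
    """x_i^*(G_i^p) for p = 1, ..., n_i, assuming G_i^p is the induced p-hop subgraph."""
    dist = hop_distances(adj, i)
    estimates = []
    for p in range(1, max(dist.values()) + 1):
        nodes = sorted(v for v, d in dist.items() if d <= p)
        index = {v: k for k, v in enumerate(nodes)}  # relabel nodes 0..len-1
        sub = [[index[w] for w in adj[v] if w in index] for v in nodes]
        estimates.append(stationary(sub, beta, delta)[index[i]])
    return estimates


# Hypothetical test graph: triangle 0-1-2 followed by the path 2-3-4-5-6.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3, 5], [4, 6], [5]]
beta, delta = 0.8, 0.5
print([round(e, 5) for e in local_estimates(adj, 0, beta, delta)])
print(round(stationary(adj, beta, delta)[0], 5))
```

The estimates are non-decreasing in $p$, and the last one uses the whole graph, so it coincides with the full stationary value.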
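Appendix B: Bounds for contact process

Lemma B.1. Let $\Phi$ be the family of all possible simple unweighted connected graphs. Let $x^*(G) = [x_1^* \cdots x_N^*]$ be the stationary solution (18) different from the origin for the graph $G \in \Phi$. Let $x^*_{\min} = \min_{G \in \Phi} \min_i x_i^*(G)$ and $x^*_{\max} = \max_{G \in \Phi} \max_i x_i^*(G)$. Let $\Gamma$ and $\Lambda$ be the sets of neighbors of the nodes associated with the values $x^*_{\min}$ and $x^*_{\max}$, respectively. Then $|\Gamma| \to \infty$ and $|\Lambda| = 1$.

Proof. Let $x_{i_m}^*$ be either $x^*_{\min}$ or $x^*_{\max}$, and let the number of its neighbors be $n$. As in the proof of Lemma A.1, we have:

$$x_{i_m}^* = \frac{1 - \left(1 - \beta x_{i_m}^*/n\right)^n}{1 - \left(1 - \beta x_{i_m}^*/n\right)^n + \delta}.$$

Since $\partial x_{i_m}^* / \partial n < 0$, the minimum value of $x_{i_m}^*$ is obtained at $n = |\Gamma| \to \infty$, and the maximum at $n = |\Lambda| = 1$, since the graph must be connected. □

Lemma B.2. Let $x^*(G) = [x_1^* \cdots x_N^*]$ be the stationary solution of (18) different from the origin. Then $x_i^*$ is bounded by

$$l^0 \leq x_i^* \leq 1 - \frac{\delta}{\beta}$$

for all $i$, where $l^0$ is the nonzero solution of

$$l^0 = \frac{1 - e^{-\beta l^0}}{1 - e^{-\beta l^0} + \delta}.$$

Proof. From Lemma B.1, the bounds come directly from the solutions of the equations

$$x = \lim_{n \to \infty} \frac{1 - \left(1 - \beta x/n\right)^n}{1 - \left(1 - \beta x/n\right)^n + \delta} = \frac{1 - e^{-\beta x}}{1 - e^{-\beta x} + \delta}$$

and

$$x = \frac{1 - (1 - \beta x)}{1 - (1 - \beta x) + \delta} = \frac{\beta x}{\beta x + \delta},$$

the latter giving $x = 1 - \delta/\beta$. □

Theorem B.3. Let $x^*(G) = [x_1^* \cdots x_N^*]$ be the stationary solution of (18) different from the origin and let

$$l_i^n = \frac{1 - \prod_{j=1}^{N} \left(1 - \beta r_{ij} l_j^{n-1}\right)}{1 - \prod_{j=1}^{N} \left(1 - \beta r_{ij} l_j^{n-1}\right) + \delta}, \qquad u_i^n = \frac{1 - \prod_{j=1}^{N} \left(1 - \beta r_{ij} u_j^{n-1}\right)}{1 - \prod_{j=1}^{N} \left(1 - \beta r_{ij} u_j^{n-1}\right) + \delta},$$

where $l_i^0 = l^0$ is the nonzero solution of $l^0 = (1 - e^{-\beta l^0})/(1 - e^{-\beta l^0} + \delta)$ and $u_i^0 = 1 - \delta/\beta$. Then $x_i^*$ is bounded by

$$l_i^0 \leq l_i^1 \leq \cdots \leq l_i^n \leq x_i^* \leq u_i^n \leq \cdots \leq u_i^1 \leq u_i^0$$

for all $i$.

Proof. The proof is completely analogous to that of Theorem A.2. □

Appendix C: Analytical solution for the contact process in the limit $\beta \to \beta_c$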



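When $\beta \to \beta_c$ (but $\beta > \beta_c$), the probability $x_i^*$ that node $i$ is infective is $x_i^* \approx \varepsilon_i$, where $0 < \varepsilon_i \ll 1$, and from the stationary equation (18) (neglecting second-order terms in $\varepsilon$) one gets

$$\delta \varepsilon_i = \left(1 - \varepsilon_i\right) \beta \sum_j r_{ij} \varepsilon_j. \qquad \text{(C1)}$$

Let $y = [\varepsilon_1 \cdots \varepsilon_N]^T$ and let $D_y = [d_{ij}]$ be a diagonal matrix such that $d_{ii} = 1 - \varepsilon_i$ and $d_{ij} = 0$ for $i \neq j$. The last equation can be written in matrix form as

$$\frac{\delta}{\beta} y = D_y R y, \qquad \text{or} \qquad \left(D_y R - \frac{\delta}{\beta} I_N\right) y = 0.$$

Assuming that $\varepsilon_i \neq 0$ for all $i$, and since $\sum_j r_{ij} = 1$, the last equation has the solution $\varepsilon_i = 1 - \delta/\beta$ for all $i$: for this uniform $y$ we have $R y = y$ and $D_y = (\delta/\beta) I_N$, so that $D_y R y = (\delta/\beta) y$. Therefore, when $\beta \to \beta_c = \delta$ and the nodes' probabilities $x_i^*$ of being infective are small, the $x_i^*$'s have an analytical solution in closed form; they are no longer topology dependent and are functions only of the spreading-process parameters $\beta$ and $\delta$.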
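The contact-process bounds of Appendix B and the near-threshold solution (C1) can be illustrated together. This sketch assumes that (18) has the form $x_i = \zeta_i/(\zeta_i + \delta)$ with $\zeta_i = 1 - \prod_j (1 - \beta r_{ij} x_j)$ and that $r_{ij} = a_{ij}/k_i$, which is consistent with $\sum_j r_{ij} = 1$ used above; the graph, parameter values and helper names are hypothetical. It computes $l^0$ by fixed-point iteration, runs the bound sequences of Theorem B.3, and evaluates the residual of (C1) for the uniform vector $\varepsilon_i = 1 - \delta/\beta$.

```python
import math


# Assumption: contact map with r_ij = a_ij / k_i,
# x_i <- z_i / (z_i + delta), z_i = 1 - prod_{j in N(i)} (1 - beta x_j / k_i).
def contact_map(adj, x, beta, delta):
    out = []
    for neighbors in adj:
        k = len(neighbors)
        prod = 1.0
        for j in neighbors:
            prod *= 1.0 - beta * x[j] / k
        z = 1.0 - prod
        out.append(z / (z + delta))
    return out


def stationary(adj, beta, delta, tol=1e-13, max_iter=1_000_000):
    """Largest fixed point of the contact map, iterating down from x = 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        x_next = contact_map(adj, x, beta, delta)
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


def l0(beta, delta, tol=1e-13, max_iter=1_000_000):
    """Nonzero root of v = (1 - e^{-beta v}) / (1 - e^{-beta v} + delta), for beta > delta."""
    v = 1.0
    for _ in range(max_iter):
        z = 1.0 - math.exp(-beta * v)
        v_next = z / (z + delta)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    return v


def contact_bounds(adj, beta, delta, n_max):
    """Sequences [l^0, ..., l^n_max] and [u^0, ..., u^n_max] of Theorem B.3."""
    lows = [[l0(beta, delta)] * len(adj)]
    ups = [[1.0 - delta / beta] * len(adj)]
    for _ in range(n_max):
        lows.append(contact_map(adj, lows[-1], beta, delta))
        ups.append(contact_map(adj, ups[-1], beta, delta))
    return lows, ups


def c1_residual(adj, beta, delta):
    """Max |delta eps - (1 - eps) beta sum_j r_ij eps| over nodes, eps = 1 - delta/beta."""
    eps = 1.0 - delta / beta
    worst = 0.0
    for neighbors in adj:
        k = len(neighbors)
        coupling = sum(eps / k for _ in neighbors)  # sum_j r_ij eps_j with r_ij = 1/k_i
        worst = max(worst, abs(delta * eps - (1.0 - eps) * beta * coupling))
    return worst


adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]  # hypothetical test graph
beta, delta = 0.8, 0.5
lows, ups = contact_bounds(adj, beta, delta, n_max=5)
x_star = stationary(adj, beta, delta)
for i in range(len(adj)):
    print(f"node {i}: {lows[-1][i]:.4f} <= {x_star[i]:.4f} <= {ups[-1][i]:.4f}")
print("C1 residual near threshold:", c1_residual(adj, 0.51, 0.5))
```

The (C1) residual vanishes up to rounding on any connected graph, because $\sum_j r_{ij} = 1$ makes the coupling term equal to $\varepsilon$ at every node.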